ABSTRACT



Methods and systems are provided for supporting keyword 
searches of data items in a structured database, such as a 
relational database. Selected data items are retrieved using 
an SQL query or other mechanism. The retrieved data values 
are documented using a markup language such as HTML. 
The documents are indexed using a web crawler or other 
indexing agent. Data items may be selected for indexing by 
identifying them in a data dictionary. The indexing agent 
produces an index that associates keywords with resource 
locators such as URLs, hot links, file paths, or distinguished 
names. After a user provides a keyword to a search engine 
interface, the index is used to obtain a resource locator that 
is associated with the keyword. The resource locator is used 
to retrieve the item's current data from the structured
database. A document containing the retrieved data is then
generated and provided to the user. 

45 Claims, 3 Drawing Sheets 
KEYWORD SEARCHES OF STRUCTURED
DATABASES 

FIELD OF THE INVENTION 

The present invention relates to information management
and retrieval in a digital system, and more particularly to the
use of keyword indexes for retrieving data both from
structured databases such as relational databases and from
textual documents such as web pages.

TECHNICAL BACKGROUND OF THE 
INVENTION 

Information is stored digitally in a wide variety of 
formats, which are accessed with a bewildering assortment
of retrieval operations. As computers containing digital 
information are increasingly connected with one another, the 
differences between different information stores become 
more evident and more frustrating. Thus, many approaches 
have been proposed or implemented to make information 
more widely available. 

Vast amounts of information are stored by corporations,
government agencies, and other entities in structured
databases, of which the most widely used are relational
databases. In a typical relational database, individual pieces
of data such as names, addresses, prices, and part numbers
are stored in rows and columns designated by headings and
organized into tables or other relations. The smallest unit of
manipulation is an individual database record holding one
(or perhaps a few) data values.

Indexes into the data records and tables are generated and
maintained internally by database management software to
make record accesses more efficient. Each database has its
own set of indexes. The indexes are updated whenever a
record's value is changed, or in some cases at periodic
intervals. In some relational databases, all records are
indexed; in others, indexes are created only after the number
of records or the importance of particular records passes a
threshold or another efficiency criterion is met. In many
relational (and other) databases only primary database key
values are indexed; other data values are retrieved by way of
the keys and the relationships defined between key values
and other (secondary) values. Information about the data
values is provided through a database query language. The
various dialects of the SQL language are among the most
widely used query languages.

Enormous amounts of information are also stored in
textual documents using markup languages such as HTML,
XML, and other variations on SGML. Markup language
document stores differ from relational databases in several
important ways. The smallest unit of retrieval is typically an
entire "page" (which may actually print as several pages).
Each page typically contains many more words or numbers
than a relational database record. The pages are not
organized into tables or other relations, but are instead
connected by hyperlinks or hot links. Pages may also be
grouped in a file system by directory placement and/or file
naming conventions.

Web crawlers and other network-roaming agents index
the pages at sporadic intervals. After a given page is posted
to the network, considerable time may pass before an agent
encounters and indexes the page. A given index often points
to information at numerous sites. The same page may be
indexed in different ways by different agents. Sometimes all
the words in a page are indexed, but more often selected
words are indexed. Since the indexed words are selected by
the web page author, they do not always impartially and
accurately summarize the page's contents. The indexes are
used by keyword search engines that provide users with an
interface that is substantially simpler, but also less powerful,
than typical SQL interfaces.
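A minimal Python sketch can make the difference between full-text indexing and indexing only author-selected words concrete. The page URL, its text, and its meta keyword list are hypothetical, and the tokenizer (lowercase runs of letters and digits) is a simplifying assumption rather than how any particular crawler works.

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str],
                meta: dict[str, list[str]] | None = None) -> dict[str, set[str]]:
    """Map each keyword to the set of page URLs it was indexed under.

    With `meta`, only the author-selected keywords of each page are indexed;
    without it, every word of the page text is indexed.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        words = meta.get(url, []) if meta is not None else tokenize(text)
        for word in words:
            index[word.lower()].add(url)
    return dict(index)


# Hypothetical page and author-supplied meta keywords.
pages = {"http://www.example.com/news.htm":
         "Press release: new widget pricing announced"}
meta = {"http://www.example.com/news.htm": ["widget", "news"]}

full_text_index = build_index(pages)
selected_index = build_index(pages, meta)
print(sorted(full_text_index))  # -> ['announced', 'new', 'press', 'pricing', 'release', 'widget']
print(sorted(selected_index))   # -> ['news', 'widget']
```

In this toy case the author's list adds "news", which never appears in the page, and omits "pricing", which does: the kind of mismatch described above.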

Much useful information is also stored in word processor 
textual documents, such as *.doc, *.pdf, *.ps, *.rtf, *.txt, and 
other documents. Word-processed document repositories 
and their associated document management systems are 
similar to web sites and to relational databases in some 
ways, and different in others. Some repositories are orga- 
nized only by placing documents in particular directories in 
a file system hierarchy; no indexing is provided to speed 
searches. Other repositories index their documents accord- 
ing to the entire text of each document in the repository, but 
indexing is more commonly based on selected keywords 
provided by the document's author or by a human or 
automated subject matter classifier. Each repository has its 
own set of indexes. The user interface may support either a
keyword search of the documents or an SQL-like query of 
an associated structured database of document keywords, 
authors, dates, titles, and similar data. 

Unfortunately, the differences between these various
information storage and retrieval approaches make it
difficult to provide a single interface that gives users access
to information from all available digital sources. The
attempts to bridge differences between different sources of
information are almost as varied as the sources themselves,
and fully comprehensive indexes are not available.

One approach to increasing information availability 
involves "dynamic HTML." An SQL query embedded in an 
HTML web page is extracted by a web server, sent to a 
relational database query handler, and processed in conven- 
tional manner by the relational database management sys- 
tem. The results of the query are placed in HTML format and 
returned to the user. This system strikes a balance between 
SQL's flexibility and SQL's complexity by deciding what 
queries are available, expressing them in natural language in 
the web page, and writing them in SQL ahead of time for the 
user. However, users who do a keyword search using a web 
browser or intranet search engine will not necessarily dis- 
cover that the relational database contains relevant 
information, even if the keywords searched are among the 
data that would have been retrieved by the dynamic HTML 
query, because the web crawler index is based on the text of 
the dynamic HTML page, not on the relational data. 
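The dynamic HTML arrangement can be sketched in Python with an in-memory SQLite database standing in for the relational database management system. The question text, the prewritten query, and the `parts` table are hypothetical. The data values exist only in the generated output, never in the static page text that a crawler would index.

```python
import sqlite3
from html import escape

# Hypothetical dynamic HTML page: explanatory text plus a prewritten query.
PAGE_TEXT = "Which parts cost less than $10?"
PAGE_QUERY = "SELECT name, price FROM parts WHERE price < 10 ORDER BY name"


def render_dynamic_page(conn: sqlite3.Connection) -> str:
    """Run the page's prewritten SQL query and wrap the result rows in HTML."""
    rows = conn.execute(PAGE_QUERY).fetchall()
    cells = "".join(f"<tr><td>{escape(name)}</td><td>{price:.2f}</td></tr>"
                    for name, price in rows)
    return (f"<html><body><h1>{escape(PAGE_TEXT)}</h1>"
            f"<table>{cells}</table></body></html>")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (part_no TEXT PRIMARY KEY, name TEXT, price REAL)")
conn.executemany("INSERT INTO parts VALUES (?, ?, ?)",
                 [("P1", "washer", 0.25), ("P2", "gear", 12.50), ("P3", "bolt", 1.10)])

page = render_dynamic_page(conn)
# A crawler indexing the static page sees only PAGE_TEXT, not "washer" or "bolt".
print(page)
```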

Another approach uses a natural language front-end to 
translate an English sentence into an SQL query which is 
then processed in conventional manner. The system provides 
greater flexibility than dynamic HTML, allowing users to 
write questions in a natural language and then translating the 
questions into SQL queries (sometimes with varying degrees 
of success). As with dynamic HTML, however, users who do 
a keyword search using a browser or search engine will not 
necessarily discover relevant information even if the key- 
words searched are among the data that would have been 
retrieved by an SQL query. The keyword search results 
might not even direct users to the natural language front-end. 

Accordingly, another approach proceeds as follows. The 
column or table heading names and relationship names used 
in the database are extracted from a data dictionary that 
defines the relational database's structure. Selected data 
values are added, and then synonyms of all these terms are 
added, creating a list of "magnet terms." The magnet terms 
are placed in a web "magnet page" that also has an SQL 
query interface. The magnet terms will be indexed by a web 
crawler, so that users who do keyword searches using the 
magnet terms are directed to the magnet page and its SQL
query interface. 
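The magnet-term construction might be sketched as follows, assuming SQLite, whose `sqlite_master` table and `PRAGMA table_info` play the role of the data dictionary. The synonym table, the selected data values, and the `/query` form action are hypothetical placeholders.

```python
import sqlite3
from html import escape

# Hypothetical thesaurus used to add synonyms of dictionary terms.
SYNONYMS = {"customers": ["clients", "buyers"], "city": ["town"]}


def magnet_terms(conn: sqlite3.Connection, selected_values: list) -> list:
    """Collect table and column names from SQLite's catalog, add the selected
    data values, then add synonyms of everything collected so far."""
    terms = set()
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        terms.add(table)
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        terms.update(row[1] for row in conn.execute(f"PRAGMA table_info({table})"))
    terms.update(selected_values)
    for term in list(terms):
        terms.update(SYNONYMS.get(term, []))
    return sorted(terms)


def magnet_page(terms: list) -> str:
    """Emit a page whose text is the magnet terms plus a query form
    (the /query action URL is a placeholder)."""
    return ("<html><body><p>" + escape(" ".join(terms)) + "</p>"
            '<form action="/query"><textarea name="sql"></textarea></form>'
            "</body></html>")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Acme Corp", "Boston"), (2, "Bolt Works", "Denver")])

terms = magnet_terms(conn, ["Boston"])  # only one data value was selected
print(terms)  # -> ['Boston', 'buyers', 'city', 'clients', 'cust_id', 'customers', 'name', 'town']
page = magnet_page(terms)
```

Because "Denver" was not selected as a magnet term, a keyword search for it would not lead to this page, illustrating how users miss data that is not among the magnet terms.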

The magnet page query interface may be a dynamic 
HTML interface, with prewritten SQL queries accompanied 
by explanatory text. The query interface may also be a 
natural language interface configured to receive English 
questions and translate them into SQL queries. Or the query 
interface may simply accept SQL queries and pass them to 
the database management software. Of course, the query 
interface may also combine dynamic HTML, natural lan- 
guage translation, and straightforward SQL querying capa- 
bilities. 

In any case, a SQL query from the query interface is 
directed to the relational database, which uses its internal 
indexes to retrieve the data. The results are packaged as 
HTML and displayed to the user. This approach has the 
advantage that if their keywords are among the magnet 
terms, then users who do a keyword search will be directed 
to the magnet page for the relational database containing the 
relevant information. However, users will usually not reach 
the query interface unless the data they seek appears in the 
magnet terms. Moreover, even if they do reach the query 
interface they must still find or formulate an SQL query that 
will retrieve the relevant information from the database. 

Instead of attempting to make relational database infor- 
mation available to web browsers, a different approach tries 
to make web pages accessible through a relational database 
interface. Text documents such as plain text files, HTML 
pages, word processor documents, and the like are entered as 
records in a relational database. Keywords or the full text of 
the documents are entered in the database's internal indexes 
to support document retrieval through the database query 
interface using SQL or another query language. 

This approach has the advantage of bringing powerful and 
well-understood relational database software to bear on the
problem of retrieving relevant text documents. But users 
who browse a network on which the relational database 
occupies only one or a few nodes will not necessarily realize 
that the information they seek resides in documents indexed 
into the database in question, even if the keywords they use 
in their browsing appear in the document indexes. The 
indexes are internal to the database and thus are used only 
in response to SQL or like queries directed specifically at the 
database. 
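The reverse approach, storing documents as database records and retrieving them through SQL, might be sketched in Python with SQLite as follows. A plain `LIKE` match stands in for a vendor's full-text index, an assumption made to keep the example portable; the table layout and file paths are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each text document becomes a row; the file path serves as the primary key.
conn.execute("CREATE TABLE documents (path TEXT PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO documents VALUES (?, ?)", [
    ("/docs/policy.txt", "Travel policy: economy class for flights under six hours."),
    ("/docs/guide.txt", "User guide for the widget configuration tool."),
])


def search_documents(conn: sqlite3.Connection, word: str) -> list:
    """Return paths of stored documents whose text contains `word`.
    SQLite's LIKE is case-insensitive for ASCII letters by default."""
    rows = conn.execute(
        "SELECT path FROM documents WHERE body LIKE ? ORDER BY path",
        (f"%{word}%",))
    return [path for (path,) in rows]


print(search_documents(conn, "widget"))   # -> ['/docs/guide.txt']
print(search_documents(conn, "FLIGHTS"))  # -> ['/docs/policy.txt']
```

The match succeeds only for a client that sends SQL to this particular database; the document text never becomes visible to a network-wide keyword index.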

Other approaches are also described in the literature 
and/or embodied in software currently being used. For 
instance, structured databases other than relational databases 
are sometimes used, including hierarchical, object- 
relational, object-oriented, and other structured databases. 
Also, at least one web crawler now indexes word processor 
documents as well as markup language documents. But the 
examples above illustrate several important characteristics 
of different approaches to publishing information: 

the smallest unit of data retrieved (e.g., database record, 
web page); 

the rules used to organize data (e.g., relations, file place- 
ment and naming conventions, hyperlinks); 

how data is retrieved (e.g., SQL queries, keyword 
searches); 

what data is indexed for each data unit (e.g., headings, 
primary database keys, author-defined keywords, 
selected keywords, full text); 

where the indexes reside (e.g., within the database system 
or outside it); 

which sources are indexed (e.g., the records of a given
database, the web sites visited by the crawler); and 
when the index is updated (e.g., when the record is
entered or modified, periodically, when the crawler 
visits the site). 
When existing approaches are viewed in the manner
discussed above, it becomes apparent that improvements are
possible. For instance, it would be an advancement in the art
to make structured database information visible to net-wide
keyword searches when a user has not yet identified the
database in question as one likely to contain relevant
information.

It would be an additional advancement to provide such a
method and system which do not interfere with existing
retrieval mechanisms, but serve instead as additional tools
for identifying and retrieving information based on
keywords.

Such a method and system are disclosed and claimed 
herein. 

BRIEF SUMMARY OF THE INVENTION 

The present invention provides a method and system for
supporting keyword searches of data items in a structured
database, such as a relational database. One method of the
invention begins with selection of at least one data item in
the structured database; each selected item contains data and
has a corresponding location identifier which identifies the
item's location within the structured database. For instance,
a relational database record may be identified by an object
class name and one or more primary database key values.
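As a rough sketch of the selection step, the Python below reads SQLite's catalog to find a table's primary-key columns and pairs each row with a location identifier made of the table name and key values. Treating the table name as the "object class name", the `LocationId` type, and the sample `customers` table are all assumptions for illustration.

```python
from __future__ import annotations

import sqlite3
from typing import Iterator, NamedTuple


class LocationId(NamedTuple):
    """Hypothetical location identifier: table (object class) name plus key values."""
    table: str
    key: tuple


def primary_key_columns(conn: sqlite3.Connection, table: str) -> list[str]:
    """Read the primary-key columns of `table` from SQLite's catalog.

    PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk), where
    pk is the column's 1-based position in the primary key, or 0 if not a key.
    """
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    key_rows = sorted((row for row in info if row[5] > 0), key=lambda row: row[5])
    return [row[1] for row in key_rows]


def select_items(conn: sqlite3.Connection, table: str) -> Iterator[tuple[LocationId, dict]]:
    """Yield (location identifier, data) for every row of `table`.
    Tables without a declared primary key are rejected by this sketch."""
    pk_cols = primary_key_columns(conn, table)
    if not pk_cols:
        raise ValueError(f"{table} has no declared primary key")
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY {', '.join(pk_cols)}")
    names = [d[0] for d in cursor.description]
    for row in cursor:
        data = dict(zip(names, row))
        yield LocationId(table, tuple(data[c] for c in pk_cols)), data


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Acme Corp", "Boston"), (2, "Bolt Works", "Denver")])

items = list(select_items(conn, "customers"))
print(items[0])
```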

The selected data items are documented by creating at
least one document, such as a web page, which resides
outside the structured database as a memory stream or as a
file and which contains a textual representation of each
selected item's data. The documents are then indexed by
creating an index outside the database which associates
keywords in the textual representation of each selected
item's data with that item's location identifier. The indexed
keywords are more comprehensive and accurate than terms
used in conventional magnet pages or web page meta
content tags because they are generated directly from most
or all of the data values.
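The documenting and indexing steps can be sketched as two small Python functions: one writes a selected item's data into an HTML document, the other builds an in-memory inverted index, outside any database, from every word in those documents to the item's location identifier. The `(table, key values)` identifier shape, the sample rows, and the simple tokenizer are hypothetical.

```python
import re
from collections import defaultdict
from html import escape, unescape


def document_item(data: dict) -> str:
    """Create an HTML document holding a textual representation of one item's data."""
    rows = "".join(f"<tr><th>{escape(str(col))}</th><td>{escape(str(val))}</td></tr>"
                   for col, val in data.items())
    return f"<html><body><table>{rows}</table></body></html>"


def index_documents(documents) -> dict:
    """Build an index, outside the database, from each keyword in a document
    to the location identifiers of the items documented there.

    `documents` is an iterable of (location identifier, document text) pairs.
    """
    index = defaultdict(set)
    for location, html_text in documents:
        text = unescape(re.sub(r"<[^>]+>", " ", html_text))  # drop tags, keep values
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(location)
    return dict(index)


# Hypothetical selected items: ((table, primary key values), row data).
selected = [
    (("customers", (1,)), {"name": "Acme Corp", "city": "Boston"}),
    (("customers", (2,)), {"name": "Bolt Works", "city": "Denver"}),
]
docs = [(loc, document_item(data)) for loc, data in selected]
index = index_documents(docs)
print(index["boston"])  # -> {('customers', (1,))}
```

This sketch indexes column headings such as "city" along with the values; a real system might choose to exclude them.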

If the structured database includes data items organized as
records in relations according to a data dictionary, then
selection may be accomplished by providing a supplemental
data dictionary which identifies the selected records or
tables. In this case, the indexing step only indexes records
and tables that are identified by the supplemental data
dictionary. A data dictionary may also be used to identify
selected data items for binary-only relational databases that
have no accessible data dictionary and for non-relational
databases.
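A supplemental data dictionary can be as simple as a mapping from exposed table names to exposed column names. In the Python/SQLite sketch below, the `EXPOSURE` mapping, the `credit_limit` column, and the `audit_log` table are hypothetical; only listed tables and columns are read for documenting, so everything else stays invisible to the indexer.

```python
import sqlite3

# Hypothetical supplemental data dictionary: table -> columns exposed to keyword search.
EXPOSURE = {"customers": ["cust_id", "name", "city"]}


def exposed_rows(conn: sqlite3.Connection, exposure: dict):
    """Yield (table, row data) for exposed tables, limited to exposed columns.
    Table and column names come from the trusted dictionary, not user input."""
    for table, columns in exposure.items():
        column_list = ", ".join(columns)
        for values in conn.execute(f"SELECT {column_list} FROM {table}"):
            yield table, dict(zip(columns, values))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT, "
             "city TEXT, credit_limit REAL)")
conn.execute("CREATE TABLE audit_log (entry TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Acme Corp', 'Boston', 50000)")
conn.execute("INSERT INTO audit_log VALUES ('internal note')")

rows = list(exposed_rows(conn, EXPOSURE))
print(rows)  # -> [('customers', {'cust_id': 1, 'name': 'Acme Corp', 'city': 'Boston'})]
```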

Indexing may be accomplished by providing to a keyword
search engine indexing agent both the textual representation
of each selected item's data and the selected item's location
identifier. The indexing agent produces an index that
associates keywords with resource locators, and each resource
locator includes a textual representation of a data item
location identifier. Suitable indexing agents include web
crawlers, indexing "bots", and other text indexing tools.
Suitable resource locators include URLs, hot links, file
paths, distinguished names, object class names, table
names, and primary database key values, among others.
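One possible way for a resource locator to carry a textual representation of a location identifier is to encode the table name and primary key values in a URL path. The base URL and the `/<table>/<key values>.htm` layout below are assumptions of this Python sketch, not a format the text prescribes.

```python
from __future__ import annotations

from urllib.parse import quote, unquote, urlsplit

BASE_URL = "http://www.example.com/salesdb"  # hypothetical document root


def to_locator(table: str, key: tuple) -> str:
    """Encode a location identifier (table name, key values) as a URL.
    quote(..., safe="") percent-encodes '/' and ',' so they cannot collide
    with the separators used here."""
    key_part = ",".join(quote(str(k), safe="") for k in key)
    return f"{BASE_URL}/{quote(table, safe='')}/{key_part}.htm"


def from_locator(url: str) -> tuple[str, tuple[str, ...]]:
    """Recover (table name, key values) from a locator; key values come back as strings."""
    path = urlsplit(url).path                 # e.g. "/salesdb/customers/1.htm"
    table, last = path.rsplit("/", 2)[-2:]
    key_part = last[: -len(".htm")]
    return unquote(table), tuple(unquote(k) for k in key_part.split(","))


# An index entry then associates a keyword with a locator rather than a raw tuple.
index = {"boston": {to_locator("customers", (1,))}}
print(index)  # -> {'boston': {'http://www.example.com/salesdb/customers/1.htm'}}
print(from_locator(to_locator("order_items", (42, "A/7"))))  # -> ('order_items', ('42', 'A/7'))
```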

Users provide keywords to a search engine interface in a
system according to the invention. The system uses the
index to obtain a resource locator that is associated with the
keyword. The resource locator is used to retrieve the item's
current data from the structured database, using SQL queries
or other structured database retrieval mechanisms. A
document containing the retrieved data, such as a web page,
is then generated and provided to the user.

The invention bridges a gap between loosely structured
textual keyword search information technologies, on the one
hand, and highly structured relational/hierarchical query
language search database technologies, on the other. Web
pages on the Internet or on an intranet are effective for
textual information that is relatively static and unstructured,
such as press releases, user guides, policy statements, and
procedure manuals. Other information, such as availability,
pricing, performance, and planning records, is more dynamic
and has traditionally been maintained in highly structured
databases such as relational or object-oriented databases.

The invention makes it possible to use a single search
method, keyword searching, to locate and retrieve desired
information from different types of information sources. In
particular, the invention makes it possible to publish selected
portions of a relational database in a manner that allows
users to retrieve relational data without knowing details of
the database's internal organization. Other features and
advantages of the present invention will become more fully
apparent through the following description.

BRIEF DESCRIPTION OF THE DRAWINGS

To illustrate the manner in which the advantages and
features of the invention are obtained, a more particular
description of the invention will be given with reference to
the attached drawings. These drawings only illustrate
selected aspects of the invention and thus do not limit the
invention's scope. In the drawings:

FIG. 1 is a diagram illustrating one of many networks
suitable for use according to the present invention.

FIG. 2 is a block diagram further illustrating components
of the network shown in FIG. 1 and other suitable systems
according to the invention.

FIG. 3 is a flowchart illustrating methods of the present
invention.

FIG. 4 is a data flow diagram illustrating components and
methods of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention relates to a method and system for
assisting keyword searches of highly structured data. Before
detailing the architecture of methods and systems according
to the invention, the meaning of several important terms is
clarified. Specific examples are given to illustrate aspects of
the invention, but those of skill in the art will understand that
other examples may also fall within the meaning of the terms
used. Some terms are also defined, either explicitly or
implicitly, elsewhere herein.

Terminology

As used here, a "keyword" search is a pattern-matching
search which tries to locate instances of digital data using a
key word or phrase. Many conventional web search engines
support keyword searches. Keywords may contain
wildcards. For instance, if the question mark is used as a
wildcard capable of matching any single character and the
asterisk is used as a wildcard capable of matching zero or
more characters, then the keyword "b?t*" would match the
words "bat", "bet", "bit", "bot", "but", "battle", "bitten",
and "butane", among others. In some cases keywords may
also contain regular expressions, such as the regular
expressions used in the familiar lexical analysis program lex
or the familiar text editors emacs and vi. A keyword may
contain smaller keywords connected by operators such as
AND and OR.

One alternative to keyword searching is "browsing"
through the available data until values of interest are
located. Browsing is available in most computer information
management systems, regardless of whether keyword
searches are supported. An important difference between
keyword searching and browsing is that keyword searches
focus much more quickly on portions of the data that are
likely to be of interest. This is particularly true if the
keyword search is performed on data that is grouped by
subject matter. For instance, a search using the keyword
"bat" in data classified by subject matter could lead quickly
to baseball statistics rather than a discussion of flying
mammals.

Many conventional structured database systems support
searches through SQL or another query language. An
important difference between query searches and keyword
searches is that query searches normally presume the
existence of relations or other structure in the data and
make assumptions about that structure. SQL queries are
often of the form SELECT X FROM Y WHERE Z, with X
being the heading name of a column in a table called Y, and
Z being some constraint on the values stored in the column.
Such a query will be rejected if no table named Y exists, or
if Y exists but has no column named X.

By contrast, keyword searches typically assume nothing
about the relationships or structures that may internally
connect different instances of matching data. In particular, a
keyword search of a relational database according to one
embodiment of the present invention for a keyword K will
identify all data values in the exposed portion of the
database that match K, regardless of the table names or
column names being used.

Even if a particular relational database system supported
queries such as SELECT ALL FROM ALL WHERE
(ENTRY CONTAINS 'K'), this would not be equivalent to
a system according to the invention which assists a keyword
search of all database records for matches to the keyword K.
For instance, the internal indexing and retrieval mechanisms
in relational databases are optimized for selecting and
combining records in rows and columns and tables, as well
as testing values against constraints; these mechanisms are
not optimized for retrieving every data value and then
testing it against the keyword. Also, web crawlers and other
keyword index builders index all data values supplied to
them, while relational databases typically index only
selected columns or rows. Finally, indexes according to the
invention will generally have a much broader context or
scope than an internal relational database index, involving
not just a single relational database but many other
information sources as well; this makes the inventive
indexes more useful with all-purpose or comprehensive
search efforts.

As used here, a "structured database" is a collection of
data items organized primarily by rules other than those
governing natural languages such as English. The data items
may contain natural language text such as addresses or part
names in a relational database, but relations, tables, trees, or
other structures are the primary means of organization.
Structured database operations aid decision-making by
allowing users to combine individual data items in various
ways, as illustrated in the SQL query above. Relational
databases are one example of structured databases; other
examples include hierarchical, inverted-list, object-relational,
object-oriented, and flat-file databases. Structured databases
may be stored in a single location or
distributed between several machines. Regardless of the
approach taken to storage, many structured databases can be
accessed through a network.

As used here, "network" includes local area networks,
wide area networks, metropolitan area networks, and/or
various "Internet" networks such as the World Wide Web, a
private Internet, a secure Internet, a value-added network, a
virtual private network, an extranet, or an intranet. One of
many possible networks suitable for use according to the
invention is shown in FIG. 1, where it is labeled 100. The
network 100 includes a server 102 and several clients 104;
other suitable networks may contain other combinations of
servers, clients, and/or peer-to-peer nodes, and a given
computer may function both as a client and as a server. The
computers connected by a suitable network may be
workstations, laptop computers, disconnectable mobile
computers, servers, mainframes, so-called "network
computers" or "lean clients", personal digital assistants, or a
combination thereof.

The network may include communications or networking
software such as the software available from Novell,
Microsoft, Artisoft, and other vendors, and may operate
using TCP/IP, SPX, IPX, and other protocols over twisted
pair, coaxial, or optical fiber cables, telephone lines,
satellites, microwave relays, modulated AC power lines,
and/or other data transmission "wires" known to those of
skill in the art. The network may encompass smaller
networks and/or be connectable to other networks through a
gateway or similar mechanism.

As suggested by FIG. 1, at least one of the computers is
capable of using a floppy drive, tape drive, optical drive,
magneto-optical drive, or other means to read a storage
medium 106. A suitable storage medium 106 includes a
magnetic, optical, or other computer-readable storage device
having a specific physical configuration. Suitable storage
devices include floppy disks, hard disks, tape, CD-ROMs,
PROMs, random access memory, and other computer
system storage devices. The physical configuration
represents data and instructions which cause the computer
system to operate in a specific and predefined manner as
described herein. Thus, the medium 106 tangibly embodies a
program, functions, and/or instructions that are executable
by computer(s) to assist keyword searches of structured data
substantially as described herein.

Suitable software for implementing the invention is
readily provided by those of skill in the art using the
teachings presented here and programming languages and
tools such as Java, Pascal, C++, C, CGI, Perl, SQL, APIs,
SDKs, assembly, firmware, microcode, and/or other
languages and tools.

Overview of Components

An overview of the main components of the invention and
its environment is now given with reference to FIG. 2. A
system 200 according to the invention operates using the
network 100 or another suitable computer system. A
structured database 202 and corresponding exposure
definitions 204 are part of the inventive system 200 or
accessible to it. The structured database 202 includes data
items which have data values; suitable databases include
conventional relational databases and other conventional
structured databases with the associated database
management system software.

The exposure definitions 204 identify the portion(s) of the
structured database 202 that will be exposed to external
keyword searches; the entire database 202 is typically
already searchable by SQL or other conventional query
means. Those of skill will appreciate that the system 200 can
also be configured such that the exposure definitions 204
identify the portions of the database 202 which should NOT
be exposed for keyword searching, if that approach is more
efficient or convenient. In either case, the exposure
definitions 204 may be in the form of a data dictionary,
particularly if the structured database 202 is a relational
database. However, the exposure definitions 204 may also
take the form of a schema, particularly if the structured
database 202 is a hierarchical database or other database.

In the illustrated system, the exposure definitions 204 are
created and edited by an administration tool 206. The tool
206 may operate by extracting the definitions 204 from an
existing data dictionary or schema, or it may be necessary to
build the definitions from scratch by reverse engineering the
data formats used in a binary-only structured database 202
and then generating a data dictionary or schema which can
be edited to eliminate portions of the database 202 that
should not be exposed.

A document generator 208 generates documents 210
which contain textual representations of the exposed data
values in the database 202. In one embodiment, the
document generator 208 generates a document, such as an
HTML page, for each table in a relational database 202,
containing the table's values in ASCII form, and then
locates the document 210 at a Uniform Resource Locator
(URL) corresponding to the table's location in the database
202. For instance, an HTML page containing the data values
stored in a sales database table named "customers" might be
generated and then stored at
http://www.company.com/salesdb/customers.htm.

An indexing agent 212 reads the documents 210 and
generates entries in an index 214. Suitable indexing agents
212 include web crawlers, spiders, indexing robots, and
other indexing tools. The indexing agent 212 may be a
network-roaming agent, or it may be tied to one or a few
network sites. In one embodiment of the system 200, the
indexing agent 212 indexes every data value in each
document 210, not just "meta tag" or other values that may
or may not be representative of the actual database
contents. Unlike indexing processes running inside the
structured database 202, the indexing agent 212 does not
rely heavily on assumptions about the database structure
but merely treats the documents 210 as sources of text
which have little or no structure except that imposed by
English or another natural language.

A keyword search engine user interface 216 may be
integral with the indexing agent 212, or it may be a separate
program provided by a separate vendor. The user interface
216 accepts keywords (possibly including wildcards) and
uses the index 214 and possibly other components of the
system 200 to locate corresponding documents 210.

Overview of Operation

An overview of the operation of the system 200 is now
given, with reference to FIGS. 2 and 3. Four main steps are
shown in FIG. 3: a data selecting step 300, an index allowing
step 302, a search performing step 304, and an index
maintaining step 306. These steps may be grouped for ease
of explanation into an indexing phase (steps 300, 302, and
306) and a searching phase (step 304). During the indexing
phase, the index 214 is created or updated. During the
searching phase, the index 214 is used to respond to
keyword searches directed at the database 202 (and often to
other information sources as well). In practice, both phases
may be happening simultaneously or in an interleaved
fashion.

The selecting step 300 illustrated includes a structure
determining step 308 and a definition editing step 310.
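The searching phase (step 304) can be sketched end to end in Python. The sketch assumes an SQLite database, an index that already maps lowercase keywords to locators of the hypothetical form `http://www.example.com/salesdb/<table>/<integer key>.htm`, and tables with a single-column primary key. Wildcards are handled by the standard `fnmatch` module, whose `?` and `*` behave like the wildcards described in the Terminology discussion; the current row is then fetched by key and rendered as HTML.

```python
import fnmatch
import sqlite3
from html import escape
from urllib.parse import unquote, urlsplit


def lookup(index: dict, keyword: str) -> list:
    """Return locators of all indexed keywords matching `keyword`.
    fnmatch treats '?' as any one character and '*' as zero or more
    (it also treats [...] as a character class)."""
    pattern = keyword.lower()
    return sorted({url for word, urls in index.items()
                   if fnmatch.fnmatchcase(word, pattern) for url in urls})


def fetch_current(conn: sqlite3.Connection, url: str) -> dict:
    """Use a locator of the assumed form .../<table>/<integer key>.htm to read
    the item's current row. A real system should check the table name against
    its exposure definitions before building SQL from it."""
    table, last = urlsplit(url).path.rsplit("/", 2)[-2:]
    table, key = unquote(table), int(unquote(last[: -len(".htm")]))
    # PRAGMA table_info: field 1 is the column name, field 5 its primary-key position.
    pk = next(r[1] for r in conn.execute(f"PRAGMA table_info({table})") if r[5] == 1)
    cursor = conn.execute(f"SELECT * FROM {table} WHERE {pk} = ?", (key,))
    names = [d[0] for d in cursor.description]
    return dict(zip(names, cursor.fetchone()))


def render(data: dict) -> str:
    """Generate the document returned to the user."""
    rows = "".join(f"<tr><th>{escape(k)}</th><td>{escape(str(v))}</td></tr>"
                   for k, v in data.items())
    return f"<html><body><table>{rows}</table></body></html>"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Acme Corp", "Boston"), (2, "Bolt Works", "Denver")])

URL1 = "http://www.example.com/salesdb/customers/1.htm"
URL2 = "http://www.example.com/salesdb/customers/2.htm"
index = {"acme": {URL1}, "boston": {URL1}, "bolt": {URL2}, "denver": {URL2}}

# The row changes after indexing; the search still returns current data.
conn.execute("UPDATE customers SET city = 'Cambridge' WHERE cust_id = 1")

hits = lookup(index, "B?st*")  # matches "boston" only
current = fetch_current(conn, hits[0])
print(render(current))
```

Because the row is read at search time, the user sees "Cambridge" even though the index still lists the keyword "boston"; refreshing such stale entries falls to the index maintaining step 306.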
During the determining step 308, the administration tool 206 determines what structures are being used in the structured database 202. For instance, the tool 206 may read an existing data dictionary (sometimes called a "catalog") of a relational database 202 or an existing schema for a hierarchical or object-oriented database 202 and then identify the relations, partitions, record types, data types, links, indexes, primary database keys, and other structures used to organize the database 202. If no data catalog or schema exists, the tool 206 may be used to assist one of skill in reverse engineering the structure definitions by examining the binary contents of the database 202 together with display formats, documentation, and any other available structural information.

During the editing step 310, the exposure definitions 204 are initially created and/or updated by the tool 206. Some embodiments favor ease of editing by closely modeling the exposure definitions 204 after an existing data dictionary or schema for each database 202, while others favor portability in the document generator 208 by making all exposure definitions 204 for all databases 202 use a common format, such as a particular relational database data dictionary format.

In any case, the selecting step 300 selects at least one data item in the structured database 202, with each selected item containing data and each selected item having a corresponding location identifier which identifies the item's location within the structured database 202. Suitable location identifiers include table, row, and/or column names; unique relational data key values; paths, filenames, common names, contexts, and/or distinguished names; offsets, pointers, and/or record numbers; pointer array or hash table indexes or entry numbers; transaction numbers or sequence numbers; universal unique identifiers (UUIDs) or globally unique identifiers (GUIDs); and combinations of such identifiers. The name or location of the database 202 may be part of a suitable location identifier, but merely identifying the database 202 is not sufficient.

The allowing step 302 illustrated includes a definition reading step 312, a data reading step 314, a data documenting step 316, a providing step 318, and an associating step 320. During the definition reading step 312, the document generator 208 reads the exposure definitions 204 and builds or locates a checklist that will be used to make sure all selected data is exposed for indexing.

During the data reading step 314, the document generator 208 reads the selected data from the database 202. Data reads may be performed directly from the binary database 202 using low-level file system commands, but it may be better to retrieve the data using the SQL interface, application program interface (API), or other existing data retrieval software of the database 202. Data reads may be done all at once, but more often the data reading step 314 and the data documenting step 316 will be repeated in pairs, so that a chunk of data is read and then documented, the next chunk of data is read and documented, and so forth until all selected data is documented. Of course, the providing step 318 and the associating step 320 may also be made part of the loop, so that each chunk of data is indexed before the next chunk is read.

More generally, FIG. 3 shows a particular order and grouping for the main steps 300 through 306 and for various subsidiary steps. However, those of skill in the art will appreciate that the steps illustrated and discussed here may be performed in various orders, except in those cases in which the results of one step are required as input to another step. Likewise, steps may be omitted unless called for in the claims, regardless of whether they are expressly described as optional in this Detailed Description. Steps may also be repeated, or combined, or named differently. In one alternative embodiment, for instance, an "indexing" step includes the step 318 of providing to the keyword search engine indexing agent 212 both the textual representation of each selected item's data and the selected item's location identifier.

During the data documenting step 316, the document generator 208 documents the selected data items by creating at least one document outside the structured database 202; the document(s) 210 contain a textual representation of each selected item's data. The document may exist as a stream of data in RAM or coming from a network or other connection. The document may also be stored on disk as a file, but those of skill will appreciate that throughput generally increases when disk accesses are reduced or eliminated. An index such as the index 214, a web crawler index, or an internal database 202 index, is not a suitable result of the documenting step 316. Rather, textual documents produced by the step 316 include plain text or word processor documents, as well as markup language documents.

Markup language documents use markup language formats such as Standard Generalized Mark-up Language (SGML), which is specified in the 1986 International Standards Organization Standard No. 8879. Familiar markup languages include HTML and XML. Other mark-up languages are used in Folio infobases, Microsoft Word documents, Corel WordPerfect documents, troff documents, and various hyperlink and hypertext documents (MICROSOFT WORD and COREL WORDPERFECT are marks of Microsoft and Corel, respectively). Mark-up languages generally provide links which associate a particular, pre-selected location in a primary text file with additional text, images, or other information, or with links to email, display, or other software.

In one embodiment, documents 210 produced with the step 316 include a comprehensive textual representation of each selected item's data. "Comprehensive" means that every data value, or at least substantially every data value, appears separately in the documents 210. Every exposed data value that might reasonably be used as a keyword should appear in the documents 210. Merely listing table, row, column, partition, subtree, or other group names is not sufficient, although these may be treated as data values and placed in the documents 210. Nor is it adequate to summarize data or to select a relatively small sampling of "representative" or "boundary" or "central" data values. However, common terms such as "a", "the", "not" and the like may be omitted from a comprehensive representation of data values to conserve space and improve keyword search efficiency. Also, comprehensiveness may be with respect to all selected (exposed) data values, or merely with respect to non-numeric exposed data values or some other efficiency grouping. For instance, a comprehensive index may include all selected data values for part numbers and customer names but exclude prices and dates in the selected data items.

During the providing step 318, the location of selected data in the database 202 and the textual representation of the selected data's values are provided to the indexing agent 212. If the agent 212 is a roaming agent, such as a web crawler, this may be accomplished by storing the documents 210 in files having names that contain the database locations of the documented data and then making the files accessible for indexing by the crawler. For instance, an HTML document 210 containing the textual representation of data values
stored in a database 202 table named "customers" could be stored in a file named "customers.htm", or an XML document 210 containing the textual representation of data stored in an object database 202 could be stored in a file whose path name includes a class identifier, file type, and GUID, such as "OLE/dll/42754580-16b7-11ce-80eb-00aa003d7352". If the agent 212 does not roam the system 200, then steps must be taken to bring the agent 212 together with the paired locations and textual data, such as by providing the pairs directly or indirectly as command line parameters or as interactive input to the agent 212.
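The paired data reading and documenting steps, together with the file naming used to hand locations to a roaming crawler, can be sketched as follows. This is a minimal illustration assuming Python's built-in `sqlite3` module in place of the structured database 202, one HTML document per row, and a hypothetical `<table>_<key>.htm` naming scheme; `document_rows` and its parameters are illustrative names, not part of the described system.

```python
import html
import sqlite3

def document_rows(conn, table, key_column, columns, chunk_size=100):
    """Yield (file_name, html_text) pairs, reading the table in chunks.

    Each file name encodes the row's database location (table name plus
    primary key value), so a crawler that indexes the file also records
    where the data came from.
    """
    # Table and column names cannot be bound as SQL parameters; they are
    # assumed to come from trusted exposure definitions, not user input.
    col_list = ", ".join([key_column] + columns)
    cursor = conn.execute(f"SELECT {col_list} FROM {table} ORDER BY {key_column}")
    while True:
        chunk = cursor.fetchmany(chunk_size)  # data reading: one chunk
        if not chunk:
            break
        for row in chunk:  # data documenting for that chunk
            key = row[0]
            cells = "".join(
                f"<tr><th>{html.escape(name)}</th>"
                f"<td>{html.escape(str(value))}</td></tr>"
                for name, value in zip(columns, row[1:])
            )
            yield (f"{table}_{key}.htm",
                   f"<html><head><title>{html.escape(table)} "
                   f"{html.escape(str(key))}</title></head>"
                   f"<body><table>{cells}</table></body></html>")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Acme", "Provo"), (2, "Initech", "Orem")])
docs = dict(document_rows(conn, "customers", "id", ["name", "city"], chunk_size=1))
print(sorted(docs))  # ['customers_1.htm', 'customers_2.htm']
```

Because `document_rows` is a generator driven by `fetchmany`, only one chunk of rows is held at a time, and a caller could pass each file to the indexing agent before the next chunk is read.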

During the associating step 320, the agent 212 associates 
the textual data values with their paired location(s) in the 
index 214, treating the data values as keywords. That is, the 
associating step 320 indexes the documents 210 by creating
or updating the index 214 (which resides outside the data- 
base 202) so that the index 214 associates keywords in the 
textual representation of each selected item's data with that 
item's location identifier. 
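What the associating step 320 produces can be approximated by an inverted index from keywords to location identifiers. A minimal in-memory sketch, assuming simple lowercase alphanumeric tokenizing and string location identifiers such as `customers/1`; an actual indexing agent 212 would use its own tokenizer and persistent structures such as B-trees or hash tables. The function names are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split a textual representation into lowercase keyword candidates."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Return an inverted index: keyword -> set of location identifiers.

    `documents` maps a location identifier (for example "customers/1")
    to the textual representation of that item's data.
    """
    index = defaultdict(set)
    for location, text in documents.items():
        for keyword in tokenize(text):
            index[keyword].add(location)
    return dict(index)

index = build_index({
    "customers/1": "Acme Widgets, Provo",
    "customers/2": "Initech, Provo",
    "parts/A-100": "Widget bracket",
})
print(sorted(index["provo"]))  # ['customers/1', 'customers/2']
print(index["widget"])         # {'parts/A-100'}
```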

The index 214 and the indexing agent 212 may use
B-trees, hashing, and other familiar data structures and
operations to create or modify or extend the index 214. If the
documents 210 are in HTML format and the agent 212 is a
web crawler that only indexes meta content tag values, then
comprehensive indexing places all (or substantially all) data
values in the meta content tags so they will be indexed by the
agent 212. 
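For a crawler that reads only meta content tags, the comprehensive representation could be emitted as a keywords meta tag. A minimal sketch, assuming the conventional `<meta name="keywords" content="...">` form and a small hypothetical list of common terms; the optional `numeric=False` switch illustrates the non-numeric "efficiency grouping" described for the documenting step 316. `meta_keywords` is an illustrative name.

```python
import html
import re

COMMON_TERMS = {"a", "an", "the", "not"}  # hypothetical stop list

def meta_keywords(values, numeric=True):
    """Build a keywords meta tag listing every distinct word of the given
    data values, omitting common terms. With numeric=False, purely
    numeric values (such as prices) are left out."""
    seen = []
    for value in values:
        if not numeric and isinstance(value, (int, float)):
            continue
        for word in re.findall(r"[\w.-]+", str(value)):
            if word.lower() in COMMON_TERMS or word in seen:
                continue
            seen.append(word)
    content = html.escape(", ".join(seen), quote=True)
    return f'<meta name="keywords" content="{content}">'

tag = meta_keywords(["The Acme Corp", "P-1001", 19.95], numeric=False)
print(tag)  # <meta name="keywords" content="Acme, Corp, P-1001">
```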

In one embodiment, the agent 212 produces an index 214 
that associates keywords with resource locators, and each 
resource locator includes a textual representation of a data
item location identifier. Suitable resource locators include
URLs (including hot links), file names, file path names,
GUIDs, distinguished names, database key values, object or
class or table or column names, and other resource identifiers.
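One way to give a URL a textual representation of a data item location identifier is to place the table name in the path and the primary key value(s) in the query string. The sketch below assumes that layout and a hypothetical base address `http://dbhost.example/reader`; it uses only standard `urllib.parse` functions, and a multicolumn key is carried as several query parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl, quote, unquote

BASE = "http://dbhost.example/reader"  # hypothetical database reader address

def make_locator(table, keys):
    """Encode a data item location (table plus key values) as a URL."""
    return f"{BASE}/{quote(table)}?{urlencode(keys)}"

def parse_locator(url):
    """Recover (table, keys) from a URL built by make_locator.

    Key values come back as strings, as they would from any URL.
    """
    parts = urlsplit(url)
    table = unquote(parts.path.rsplit("/", 1)[-1])
    return table, dict(parse_qsl(parts.query))

url = make_locator("order_items", {"order_id": 42, "line": 3})
print(url)                 # http://dbhost.example/reader/order_items?order_id=42&line=3
print(parse_locator(url))  # ('order_items', {'order_id': '42', 'line': '3'})
```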

A major advantage of the present invention is that the 
index 214 will tend to contain entries for data sources other 
than the database 202, unlike the internal database 202 
indexes. For instance, the index 214 may associate keywords 
with storage locations in multiple relational and other
databases, web sites, file systems, word processor document 
management systems, Lotus Notes (mark of IBM) 
databases, Microsoft Exchange (mark of Microsoft) 
databases, and other data sources. 

Moreover, adding structured database 202 values to an
existing index 214 with the invention leverages the existing
values in the index 214, the existing indexing capability of
the agent 212, existing search engine interfaces 216, and
existing document 210 formats. The invention extends these
capabilities, rather than attempting to replace them by forcing
use of yet another closed, proprietary data format.

The keyword search performing step 304 illustrated 
includes a keyword obtaining step 322, an index using step 
324, a retrieving step 326, a documenting step 328, and a 
transmitting step 330. During the keyword obtaining step
322, the user interface 216 obtains a keyword from a user.
The user may be a human, or it may be a task, thread, or
other computer process. The keyword may be a single word,
a portion of a word with one or more wildcards, or a
combination of such words. Combinations are formed using
familiar text search operators such as And, Or, But Not,
Within N Words, Within Same Sentence, and the like. 
Keyword searches may be performed in the context of 
subject matter, chronological, or field scope constraints. 
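Wildcards and the Boolean operators listed above can be illustrated against an inverted index that maps each keyword to a set of locations. This sketch assumes shell-style `*` and `?` wildcards via Python's `fnmatch` module and supports only And, Or, and But Not, applied left to right; proximity operators such as Within N Words would need word positions, which this simple index does not store. `lookup` and `search` are hypothetical names.

```python
import fnmatch

def lookup(index, term):
    """Return a new set of locations for one keyword.

    '*' and '?' in the term act as wildcards over the indexed keywords.
    """
    term = term.lower()
    if any(ch in term for ch in "*?"):
        hits = set()
        for keyword in fnmatch.filter(index, term):
            hits |= index[keyword]
        return hits
    return set(index.get(term, ()))

def search(index, first, *clauses):
    """Combine keyword lookups left to right.

    Each clause is an (operator, term) pair; operator is "AND", "OR",
    or "BUT NOT".
    """
    result = lookup(index, first)
    for op, term in clauses:
        hits = lookup(index, term)
        if op == "AND":
            result &= hits
        elif op == "OR":
            result |= hits
        elif op == "BUT NOT":
            result -= hits
        else:
            raise ValueError(f"unsupported operator: {op}")
    return result

index = {"acme": {"c/1"}, "provo": {"c/1", "c/2"},
         "provost": {"c/3"}, "orem": {"c/2"}}
print(sorted(search(index, "prov*", ("BUT NOT", "orem"))))  # ['c/1', 'c/3']
print(sorted(search(index, "acme", ("OR", "orem"))))        # ['c/1', 'c/2']
```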

During the index using step 324, the search engine 216
uses the index 214 to obtain the location(s) of instances that
match the keyword. Although an integrated interface and
search engine 216 is illustrated, in other embodiments the
index-using search engine is separate from the user interface
and may even accept keyword searches from several differ-
ent user interfaces. Familiar pattern-matching and lookup
techniques, such as those currently available through
Yahoo!, Digital Alta Vista, Infoseek, and Excite web sites
(marks of their respective owners) and other keyword search
engines may be used during the step 324.

During the retrieving step 326, documents 210 containing 
instances of the keyword may be supplied to the search 
engine 216 for transmission to the user; no documents are
supplied if no matches are found. The documents 210 may 
have been created during the documenting step 316 as part 
of the indexing phase, or they may be created in response to 
the keyword search being performed during the step 304. 

In the latter case, the search engine 216 and the document 
generator 208 use the location information obtained from the 
index 214 to retrieve data values from the structured data- 
base 202 and then create corresponding documents 210 
during the step 328. In one embodiment, only the individual 
data values that match the keyword and reside in the selected 
data items are retrieved. In another embodiment contextual 
information, such as nearby data values or table names, is 
also retrieved and documented. Retrieval during the step 326 
may otherwise proceed generally as discussed in connection 
with the data reading step 314 above. The documenting step 
328 may proceed generally as discussed in connection with 
the documenting step 316 above. 
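When documents are generated at search time, the location returned by the index is turned back into a query for the item's current data. The following sketch assumes a URL layout with the table name in the path and key values in the query string, an in-memory `sqlite3` database, and a hypothetical `exposed` mapping standing in for the exposure definitions. Because table and column names cannot be bound as SQL parameters, they are checked against that mapping before the query is built, and only key values are passed as parameters.

```python
import html
import sqlite3
from urllib.parse import urlsplit, parse_qsl

def fetch_current(conn, url, exposed):
    """Retrieve an item's current data from a locator URL and document it.

    `exposed` maps each exposed table to (key_columns, data_columns).
    Returns an HTML string, or None if the item no longer exists.
    """
    parts = urlsplit(url)
    table = parts.path.rsplit("/", 1)[-1]
    keys = dict(parse_qsl(parts.query))
    if table not in exposed:
        raise LookupError(f"table {table!r} is not exposed")
    key_columns, columns = exposed[table]
    if set(keys) != set(key_columns):
        raise LookupError("locator does not give the full primary key")
    where = " AND ".join(f"{k} = ?" for k in key_columns)
    sql = f"SELECT {', '.join(columns)} FROM {table} WHERE {where}"
    row = conn.execute(sql, [keys[k] for k in key_columns]).fetchone()
    if row is None:
        return None  # the item was deleted after it was indexed
    items = "".join(f"<li>{html.escape(c)}: {html.escape(str(v))}</li>"
                    for c, v in zip(columns, row))
    return f"<html><body><h1>{html.escape(table)}</h1><ul>{items}</ul></body></html>"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, balance REAL)")
conn.execute("INSERT INTO customers VALUES (1, 'Acme', 10.0)")
conn.execute("UPDATE customers SET balance = 25.5 WHERE id = 1")  # changed after indexing
exposed = {"customers": (["id"], ["name", "balance"])}
page = fetch_current(conn, "http://dbhost.example/reader/customers?id=1", exposed)
print(page)
```

The page reflects the balance as it stands at search time, not as it stood when the item was indexed.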

The step 330 may send documents 210 to the user 
interface 216 to be displayed on a screen as part of a 
graphical user interface, stored in a file, or otherwise used. 
The documents 210 may be summarized, compressed, 
encrypted, translated, or otherwise manipulated before, 
during, or after their transmittal. 

The index maintaining step 306 proceeds generally like 
the allowing step 302, except that only some of the selected 
data items are indexed. For instance, a log of changes to the 
structured database 202 may be maintained by the database 
202 or by the administration tool 206, so that only data 
values that may have changed are re-indexed. 
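The change-log approach to maintaining the index can be sketched as an incremental update of a keyword-to-locations index. This assumes a log of (action, location) pairs using the Add, Update, and Delete action codes that appear elsewhere in this description, plus caller-supplied `fetch_text` and `tokenize` functions; all names are hypothetical.

```python
def apply_changes(index, change_log, fetch_text, tokenize):
    """Re-index only the items named in a change log.

    index:      dict keyword -> set of location identifiers (updated in place)
    change_log: iterable of (action, location); action is "Add", "Update", or "Delete"
    fetch_text: callable returning an item's current textual representation
    tokenize:   callable splitting that text into keywords
    """
    for action, location in change_log:
        # Remove every existing association for this location. A linear scan
        # keeps the sketch short; a real index would keep a reverse map.
        for keyword in list(index):
            index[keyword].discard(location)
            if not index[keyword]:
                del index[keyword]
        if action in ("Add", "Update"):
            for keyword in tokenize(fetch_text(location)):
                index.setdefault(keyword, set()).add(location)
    return index

index = {"acme": {"c/1"}, "provo": {"c/1", "c/2"}, "initech": {"c/2"}}
current = {"c/1": "Acme Orem"}  # c/1 moved to Orem; c/2 was deleted
log = [("Update", "c/1"), ("Delete", "c/2")]
apply_changes(index, log, current.get, lambda text: text.lower().split())
print(index)  # {'acme': {'c/1'}, 'orem': {'c/1'}}
```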
Additional Examples 

FIG. 4 illustrates further the components, environment, 
and operation of one embodiment of the invention; reference 
is also made to the earlier figures. FIG. 4 provides one of 
many possible examples; steps and/or components may be 
added, omitted, re-ordered, and/or performed concurrently 
in other embodiments according to the invention. 

During the indexing phase, a database administrator 400 
performs the editing step 310 by using the administration 
tool 206 to create exposure definitions 204 in the form of 
data dictionary definitions 402. A pre-existing data dictio- 
nary 404 defines the structure of the entire database 202; the 
exposure definitions 204 divide the data into a portion 406 
which is exposed for indexing and a portion 408 which will 
not be indexed into the index 214. The data dictionary 402 
may also be used to associate selected classes with specific 
tables or views, to associate default named attributes and 
attribute types with each selected table column, and to assist 
operations such as data type conversion and output format- 
ting. 
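A minimal sketch of how exposure definitions might divide a database into an exposed portion and an unexposed portion, and yield the queries used later when the exposed data is extracted. It assumes the pre-existing data dictionary can be represented as a mapping from table names to column names; the `Exposure` record is a hypothetical stand-in for a data dictionary definition 402, and real vendor catalogs differ in layout.

```python
from dataclasses import dataclass

@dataclass
class Exposure:
    """Hypothetical exposure definition: a table (or view), its primary
    key column(s), and the data columns exposed for indexing."""
    table: str
    key_columns: list
    exposed_columns: list

def exposure_queries(data_dictionary, exposures):
    """Build one SELECT per exposed table.

    data_dictionary maps every table to all of its column names; tables
    and columns not named in `exposures` are never queried.
    """
    queries = {}
    for exp in exposures:
        known = data_dictionary.get(exp.table)
        if known is None:
            raise KeyError(f"unknown table {exp.table!r}")
        unknown = set(exp.key_columns + exp.exposed_columns) - set(known)
        if unknown:
            raise KeyError(f"unknown columns {sorted(unknown)} in {exp.table!r}")
        columns = exp.key_columns + [c for c in exp.exposed_columns
                                     if c not in exp.key_columns]
        queries[exp.table] = f"SELECT {', '.join(columns)} FROM {exp.table}"
    return queries

dictionary = {"customers": ["id", "name", "city", "credit_limit"],
              "payroll": ["emp_id", "salary"]}
exposures = [Exposure("customers", ["id"], ["name", "city"])]
print(exposure_queries(dictionary, exposures))
# {'customers': 'SELECT id, name, city FROM customers'}
```

Here `credit_limit` and the whole `payroll` table play the role of the unexposed portion.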

During the definition reading step 312, a combination 
database reader and page generator 410 (which act as the 
document generator 208) reads the data dictionary 402 to 
identify the portion of the database 202 that will be exposed 
to a web crawler 412 (which acts as the indexing agent 212). 
If the administrator 400 wishes to create a virtual record that 
is the join of several tables so that users 420 receive 
additional context in search results, the administrator 400
can use the tool 206 and the dictionary 402 to do so, and the 
database reader 410 will treat the resultant join as a com- 
posite record. 
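In a relational database, one way to obtain such a virtual record is a view that joins the tables, which the reader can then query like a single table. The sketch below uses `sqlite3` with hypothetical `customers` and `orders` tables; it stands in for whatever join definition the tool 206 stores in the dictionary 402, which the text does not specify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, part TEXT);
INSERT INTO customers VALUES (1, 'Acme');
INSERT INTO orders VALUES (10, 1, 'P-1001'), (11, 1, 'P-2002');
-- Each row of this view is one composite record keyed by order_id,
-- carrying the customer name as added context.
CREATE VIEW customer_orders AS
  SELECT o.order_id, c.name, o.part
  FROM orders o JOIN customers c ON c.id = o.customer_id;
""")
rows = conn.execute(
    "SELECT order_id, name, part FROM customer_orders ORDER BY order_id"
).fetchall()
print(rows)  # [(10, 'Acme', 'P-1001'), (11, 'Acme', 'P-2002')]
```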

During the data reading step 314, the database reader
410 creates SQL queries 414 which will extract the exposed 
data 406, queries the database 202, and buffers the extracted 
data 406. During the documenting step 316, the page gen- 
erator 410 creates HTML pages 416 containing the extracted 
data 406. The URL associated with each HTML page 416 
includes a textual representation of the location in the 
database 202 from which the data represented in the page 
416 was extracted. 

During the providing step 318, the HTML pages 416 are 
made accessible to one or more web crawlers 412, along 
with the corresponding URLs generated by the page gen- 
erator 410. During the associating step 320, the web crawler 
412 reads the HTML pages 416 and creates or updates an 
index 418. This concludes the indexing phase, or at least the 
first iteration of the indexing phase; subsequent indexing 
may be interleaved with keyword searches or performed 
concurrently with such searches. 

In the search phase, during the keyword obtaining step 
322 a user 420 enters a keyword search 422 into a web or 
Internet or intranet search engine 424. During the step 324, 
the search engine 424 uses the crawler index 418 to generate 
search results that (for purposes of illustration we will 
assume) contain URLs generated by the page generator 410. 
During one version of the retrieving step 326, the corre- 
sponding pages 416, which were generated during the index- 
ing phase, are then supplied to the search engine 424 for 
transmittal to the user 420. The search phase may end at this 
point. 

However, during another version of the retrieving step 
326, the user 420 may also request (implicitly or expressly) 
additional detail about a keyword search result whose URL 
was generated by the page generator 410, or the most current 
possible results. In response, the search engine 424 asks a 
web page server 426 for the HTML page located at the URL. 
The web server 426 asks the database reader 410 for the 
HTML page. The database reader 410 uses the data dictio- 
nary 402 to formulate a SQL query 414 for the correspond- 
ing current data, based on the data location information 
embedded in the URL. The database reader 410 accepts the 
SQL query response and buffers it. During the step 328, the 
page generator 410 creates detail HTML pages 428 contain- 
ing the current data provided in the SQL query response. 
Finally, during the transmitting step 330, the page generator 
410 makes the detail HTML pages 428 accessible to the web 
page server 426, which passes the detail HTML pages 428 
to the search engine 424, which displays the detail HTML 
pages 428 to the user 420. 

In one alternative embodiment, the structured database 
202 includes data items organized as records in relations 
according to the data dictionary 404, the selecting step 300
includes the step of providing the supplemental data dictio- 
nary 402 which identifies selected records or tables, and the 
indexing step 320 only indexes records and tables that are 
identified by the supplemental data dictionary 402. 

In some embodiments, the computer system 200 includes 
a selecting means for selecting data items in the structured 
database 202. Suitable selecting means include the exposure 
definitions 204 and/or 402, an exposure definition schema 
defining exposed elements of the database 202, the admin- 
istration tool 206, software and/or hardware implementing 
the selecting step 300, and/or other selecting means, in 
appropriate combinations. 
In some embodiments, but particularly if the structured
database 202 includes a relational database and the data
items include relational database records or tables, the
selecting means includes the selection data dictionary 402
which specifies only selected relational database records or
tables. The data dictionary 402 may be used when other
definitions 404 are present, or when they are not, and may
be used even if the database 202 is not entirely relational.

The system 200 also includes a retrieving means for
retrieving from the database 202 the current data of a
selected data item, such as the document generator 208,
search engine 424, database reader 410, document server
426, software and/or hardware implementing the retrieving
step 326, and/or other retrieving means, in appropriate
combinations.

In addition, the system 200 includes an exposing means
for exposing to the indexing agent 212 information about a
data item's location in the database 202 together with
information about the data item's retrieved data. Suitable
exposing means include the document generator 208, page
generator 410, documents 210 and/or 416 and/or 428,
software and/or hardware implementing the documenting step
316 or providing step 318, means for invoking the agent 212
or crawler 412, and/or other exposing or documenting
means, in appropriate combinations.

In one embodiment, the search engine interface 216 and
the retrieving means reside on different nodes in the network
100 and communicate with one another using a TCP/IP
network protocol. In another embodiment, communication
is accomplished using an IPX network protocol.

In one embodiment, the administration tool 206 and other
system 200 components are compatible with widely used
commercial operating system, networking, and database
management software and systems, and include a user
interface designed to prevent confusion by limiting
administrator 400 access to one set of exposure definitions 204 at
a time. For instance, one embodiment supports the data
dictionary 404 table layouts for major commercial database
vendors such as Oracle, SQL Server, Sybase, and Informix.

Different database vendors may have different names for
different data types, so all types in the data dictionary 404
are coerced into one of the following types: Date; Number
(includes at least Integer, Real, Float); and Char (includes at
least VarChar2, Long).
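The coercion of vendor-specific types into Date, Number, and Char can be sketched as a lookup on the declared type name. The mapping below is a partial, hypothetical example (entries such as TIMESTAMP or SMALLINT are assumptions, not taken from the text); real entries would come from each vendor's catalog. Types outside the mapping, such as RAW or BLOB, are rejected.

```python
# Hypothetical, partial mapping from vendor type names to the coerced types.
COERCED_TYPES = {
    "DATE": "Date", "DATETIME": "Date", "TIMESTAMP": "Date",
    "INTEGER": "Number", "INT": "Number", "SMALLINT": "Number",
    "REAL": "Number", "FLOAT": "Number", "NUMBER": "Number",
    "NUMERIC": "Number", "DECIMAL": "Number",
    "CHAR": "Char", "VARCHAR": "Char", "VARCHAR2": "Char",
    "LONG": "Char", "TEXT": "Char",
}

def coerce_type(vendor_type):
    """Map a declared column type such as 'VARCHAR2(40)' to Date, Number, or Char."""
    base = vendor_type.upper().split("(", 1)[0].strip()
    if base not in COERCED_TYPES:
        # RAW, BLOB, and similar types are left unsupported in this sketch.
        raise ValueError(f"unsupported column type: {vendor_type}")
    return COERCED_TYPES[base]

print([coerce_type(t) for t in ["VarChar2(40)", "NUMBER(9,2)", "Date", "Long"]])
# ['Char', 'Number', 'Date', 'Char']
```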

At least initially, implementation may be eased by not
supporting RAW or BLOB data types, but support for these
and other types is included in alternative embodiments of the
invention. Likewise, both textual and relational/structured
information stores are becoming better adapted for use with
graphical and audible data, such as static images, video
clips, and audio files. Terms such as "textual" and "data
value" used herein should be understood to include such
digital forms of multi-media and audiovisual information.

The capabilities available through this embodiment of the
tool 206 in an "Admin" menu include: New (start new
exposure definitions 204); Open (open existing set of
exposure definitions 204 for review and possible editing or
copying); Save or Save As (save exposure definitions 204 in
a file); Project (edit configuration values such as database
202 name, database 202 user ID and password); Generate
(generate an HTML index file and HTML template files for
each object class in the target directory for a currently open
set of exposure definitions 204); Initialize (drop and create
database dictionary tables in the current database 202
account); and Exit.

In this embodiment, information needed to connect the 
tool 206 to the database 202 includes: a file name (full path) 
for the exposure definitions 402 and other configuration
values; directory location(s) for HTML output template 
files; a database name (displayed at top of every output 
HTML page 210 in case multiple databases are crawled and 
indexed together); and a database user ID, password, and 
connection string (used by the tool 206 and the database
reader 410 to log into and read the database 202). In one
alternative embodiment, the information provided to the tool 
206 also includes a directory location for an HTML index 
file 214. 

The capabilities available through this embodiment of the
tool 206 in an "Objects" menu include: Object Screen (list 
of database 202 user names populated on entry leads to list 
showing tables and views owned by selected user and object 
class information defined for each table); Attribute Sub- 
Screen (column names for table are queried and displayed; 
for newly defined objects with no existing attribute records, 
the column names are inserted in data dictionary first and 
then queried; by default, attributes are populated such that attribute labels are the same as column names, sequence is the same as column sequence, display flag is on, primary key flag is off, character data types are given an HTML string tag and domain Text, number data types are given an HTML numeric tag and domain Number (9999), and no units are initially assigned); Object Detail Sub-Screen (object details are queried and displayed on entry; new object details may be defined by selecting from a list of currently defined object classes); Object Detail Attributes Sub-Screen (defines attributes for object detail, similarly to Attribute Sub-Screen, except that join conditions between object detail and object class must be defined, as by selecting attributes from lists in current object class and object detail).

The capabilities available through this embodiment of the tool 206 in a "Domains" menu include a Domain Screen. On entry, a list is populated with the domain names currently defined. As a domain is selected, the field values are displayed. The administrator 400 can add, update, and delete domain field values. By default, the following domains should be defined on creation of a data dictionary 402: Text (tagged as a key identifier), Text (plain), Number (9999), Number (9,999), Money ($9.99), Money ($9), Percent (9%), Percent (9.9%), Percent (9.99%), Date (MM/DD/YY), Date (DD-MON-YY).

The capabilities available through this embodiment of the tool 206 in a "Units" menu include a Units Screen. On entry a list is populated with the unit types currently defined. As a unit type is selected, the fields are displayed along with related units child records. The administrator 400 can add, update, and delete unit field values.

In one embodiment, the database reader 410 includes a crawler interface and the system 200 operates as follows. The crawler 412 crawls an URL for an index page 416 containing a list of hot links to all selected object classes. As the crawler follows the link from the index page 416 for each object class, the database reader 410 retrieves the corresponding record from the database 202 and feeds matching HTML text to the crawler 412 for indexing. HTML pages representing retrieved data are generated by the page generator 410.

The crawler 412 can work in two modes. In a Full Scan Mode, all selected records of the table are crawled and indexed. In an Update Only Mode, only records which have been added, updated, or deleted are retrieved and crawled. Updated records can be identified by logging them in a transaction table for the object class with their primary database key and a timestamp. The log must be updated as logged records are crawled. Transaction table columns include the primary key column(s) of the object class, an action code column (Add, Update, or Delete), and a timestamp column.

In one embodiment, the database reader 410 includes a query interface and the system 200 operates as follows. After the user 420 queries records in the crawler index 418, the user 420 seeks the current detailed database record. After selection of the hot link to the record, the database reader 410 queries the target table according to the location parameters in the hot link, which are the object class name and the primary database key values. The database reader 410 buffers the record and invokes the page generator 410, and the HTML text is sent back to the user 420 as previously described.

In addition, the following capabilities are provided in some embodiments of the database reader 410. Column level stored functions are defined at the domain or attribute level which allow the value of a database 202 column to be modified at query time. Input parameters for a domain level stored function include the column value and domain ID, and input parameters for an attribute level stored function include the column value, attribute ID, and row ID of the database 202 record. An output format mask is provided for numeric and date column data types. Unit scale conversions are supported. Multicolumn primary database keys for object classes and object details are supported. Finally, support is provided for managing multiple object classes and their detail records which are children of a parent object class record.

In one embodiment, the page generator 410 operates such that all database 202 column output is converted to ASCII or another character format and displayed according to the HTML template page for the particular object class involved. The format specification for template fields is in the form <object_class_name>.<attribute_label>. The name format for HTML template files is <object_class_table_name>_tmplt.htm. Object class and database 202 name are displayed at the top of the generated page 416. Field alignment is center, right, or left, with left justification being the default.

In summary, the present invention provides a novel system and method for making structured database contents available through keyword searches. By making it possible to use web crawler indexes to locate relational database records and object-oriented database objects as well as word processed documents and web pages, the invention reduces the complexity and inefficiency of searches spanning heterogeneous data sources. Moreover, the invention leverages existing information and technology resources instead of requiring users to adopt expensive new systems that are not compatible with existing resources.

Although particular methods embodying the present invention are expressly illustrated and described herein, it will be appreciated that apparatus and article embodiments may be formed according to methods of the present invention. Unless otherwise expressly indicated, the description herein of methods of the present invention therefore extends to corresponding apparatus and articles, and the description of apparatus and articles of the present invention extends likewise to corresponding methods.

The invention may be embodied in other specific forms without departing from its essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. Any explanations provided herein of the scientific principles employed in the present invention are illustrative only. The scope of the invention is, therefore, indicated by the appended claims
rather than by the foregoing description. All changes which
come within the meaning and range of equivalency of the 
claims are to be embraced within their scope. 

What is claimed and desired to be secured by patent is: 

1. A method supporting keyword searches of data items in
a structured database, the method comprising the computer- 
implemented steps of: 

selecting at least one data item in the structured database, 
each selected item containing data and each selected 
item having a corresponding location identifier which
identifies the item's location within the structured data- 
base; 

documenting the selected data items by creating at least 
one document outside the structured database which 
contains a textual representation of each selected item's
data; and 

indexing the documents by creating an index outside the 
database, the index associating keywords in the textual 
representation of each selected item's data with that 
item's location identifier,

wherein the structured database includes data items orga- 
nized as records in relations according to a data 
dictionary, the selecting step includes the step of
providing a supplemental data dictionary which identifies
selected records or tables, and the indexing step only 
indexes records and tables that are identified by the 
supplemental data dictionary. 

2. The method of claim 1, wherein the indexing step 
includes providing to a keyword search engine indexing
agent both the textual representation of each selected item's 
data and the selected item's location identifier. 

3. The method of claim 2, wherein the indexing agent 
produces an index that associates keywords with resource 
locators, and each resource locator includes a textual
representation of a data item location identifier.

4. The method of claim 3, wherein the resource locator 
includes an URL. 

5. The method of claim 3, wherein the resource locator 
includes a file path.

6. The method of claim 3, wherein the textual represen- 
tations are comprehensive with respect to the data values of 
selected data items. 

7. The method of claim 1, wherein the creating step 
creates an index containing keywords that are textual
representations of data in the selected data items.

8. The method of claim 7, wherein the creating step 
creates an index containing keywords that are textual rep- 
resentations of non-numeric data in the selected data items. 

9. A method supporting keyword searches of data items in
a structured database, the method comprising the computer- 
implemented steps of: 

selecting at least one data item in the structured database, 
each selected item containing data and each selected 
item having a corresponding location identifier which
identifies the item's location within the structured data- 
base; 

documenting the selected data items by creating at least 
one document outside the structured database which 
contains a textual representation of each selected item's
data; and 

indexing the documents by creating an index outside the 
database, the index associating keywords in the textual 
representation of each selected item's data with that 
item's location identifier,

wherein the indexing step includes providing to a key- 
word search engine indexing agent both the textual 
representation of each selected item's data and the
selected item's location identifier, the indexing agent 
produces an index that associates keywords with 
resource locators, each resource locator includes a 
textual representation of a data item location identifier, 
and the resource locator includes a distinguished name. 
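For the distinguished-name locator of claim 9, a hedged sketch follows. The attribute names `key`, `table`, and `db` are hypothetical, and the escaping is a simplified version of the LDAP string rules in RFC 4514: special characters are backslash-escaped, but leading or trailing spaces and a leading `#` are not handled.

```python
from typing import List, Tuple

# Characters backslash-escaped in values (simplified relative to RFC 4514).
_SPECIAL = set(',+"\\<>;=')


def _escape(value: str) -> str:
    return "".join("\\" + ch if ch in _SPECIAL else ch for ch in value)


def to_dn(database: str, table: str, key: str) -> str:
    """Express a location identifier as a DN, most specific component first."""
    rdns = [("key", key), ("table", table), ("db", database)]
    return ",".join(f"{attr}={_escape(val)}" for attr, val in rdns)


def from_dn(dn: str) -> List[Tuple[str, str]]:
    """Split a DN built by to_dn back into (attribute, value) pairs."""
    result: List[Tuple[str, str]] = []
    attr, buf, in_value, escaped = "", [], False, False
    for ch in dn:
        if escaped:
            buf.append(ch)  # escaped character is taken literally
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == "=" and not in_value:
            attr, buf, in_value = "".join(buf), [], True
        elif ch == ",":
            result.append((attr, "".join(buf)))
            attr, buf, in_value = "", [], False
        else:
            buf.append(ch)
    result.append((attr, "".join(buf)))
    return result


if __name__ == "__main__":
    dn = to_dn("hr", "employees", "42")
    print(dn)           # key=42,table=employees,db=hr
    print(from_dn(dn))  # [('key', '42'), ('table', 'employees'), ('db', 'hr')]
```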

10. A method supporting keyword searches of data items 
in a structured database, the method comprising the 
computer-implemented steps of: 

selecting at least one data item in the structured database, 
each selected item containing data and each selected 
item having a corresponding location identifier which 
identifies the item's location within the structured data- 
base; 

documenting the selected data items by creating at least 
one document outside the structured database which 
contains a textual representation of each selected item's 
data; and 

indexing the documents by creating an index outside the 
database, the index associating keywords in the textual 
representation of each selected item's data with that 
item's location identifier, 

wherein the creating step creates an index containing 
keywords that are textual representations of data in the 
selected data items and also containing every keyword 
that is a textual representation of data in the selected 
data items. 

11. A method supporting keyword searches of data items 
in a structured database, the method comprising the 
computer-implemented steps of: 

selecting at least one data item in the structured database, 
each selected item containing data and each selected 
item having a corresponding location identifier which 
identifies the item's location within the structured data- 
base; 

documenting the selected data items by creating at least 
one document outside the structured database which 
contains a textual representation of each selected item's 
data; 

indexing the documents by creating an index outside the 
database, the index associating keywords in the textual 
representation of each selected item's data with that 
item's location identifier; and 

logging changes that are made to data items after the 
creating step and then updating the index to reflect at 
least some of the changes. 
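Claim 11 leaves the logging mechanism open. One minimal sketch, assuming changes are logged in memory as (locator, new text) records and replayed in a batch, keeps each item's last indexed text so that stale keywords can be removed; the class and method names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Change:
    locator: str   # textual location identifier of the changed item
    new_text: str  # new textual representation ("" if the item was deleted)


@dataclass
class IncrementalIndex:
    index: Dict[str, Set[str]] = field(default_factory=dict)
    text_by_locator: Dict[str, str] = field(default_factory=dict)
    log: List[Change] = field(default_factory=list)

    def record_change(self, locator: str, new_text: str) -> None:
        """Log a change made to a data item after the index was created."""
        self.log.append(Change(locator, new_text))

    def apply_log(self) -> None:
        """Replay logged changes: drop old keywords, add new ones, clear the log."""
        for change in self.log:
            old_text = self.text_by_locator.get(change.locator, "")
            for kw in _keywords(old_text):
                locators = self.index.get(kw)
                if locators is not None:
                    locators.discard(change.locator)
                    if not locators:
                        del self.index[kw]
            for kw in _keywords(change.new_text):
                self.index.setdefault(kw, set()).add(change.locator)
            if change.new_text:
                self.text_by_locator[change.locator] = change.new_text
            else:
                self.text_by_locator.pop(change.locator, None)
        self.log.clear()


def _keywords(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))


if __name__ == "__main__":
    idx = IncrementalIndex()
    idx.record_change("/employees/1", "Alice Smith")
    idx.apply_log()
    idx.record_change("/employees/1", "Alice Jones")  # e.g. a name change
    idx.apply_log()
    print(sorted(idx.index))  # ['alice', 'jones']
```

Batching the replay means the index is rebuilt only for items that actually changed, rather than re-crawling the whole selection.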

12. A method supporting keyword searches of data items 
in a structured database, the method comprising the 
computer-implemented steps of: 

selecting at least one data item in the structured database, 
each selected item containing data and each selected 
item having a corresponding location identifier which 
identifies the item's location in the structured database; 

allowing a network-roaming indexing agent to create an 
index which associates keywords with resource 
locators, each keyword being a textual representation 
of data from a selected data item and each resource 
locator containing a textual representation of the cor- 
responding selected item's location identifier; 

obtaining a keyword from a search engine interface; 

using the index to obtain a resource locator associated 
with the keyword; and then 

using the resource locator to retrieve the item's current 
data from the structured database. 
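The retrieval flow of claim 12 can be sketched as follows, assuming the index maps lowercase keywords to sets of locator strings and that a caller-supplied `fetch_current` function (hypothetical) reads the structured database. The sketch illustrates that search results reflect the item's current data even when the index was built from an earlier snapshot.

```python
from typing import Callable, Dict, List, Set


def keyword_search(
    keyword: str,
    index: Dict[str, Set[str]],
    fetch_current: Callable[[str], str],
) -> List[str]:
    """Return the current data of every item whose locator the index lists for keyword."""
    locators = sorted(index.get(keyword.lower(), set()))
    # The index only supplies locators; the data itself is read at query time.
    return [fetch_current(locator) for locator in locators]


if __name__ == "__main__":
    database = {"/employees/1": "Alice Smith", "/employees/2": "Bob Smith"}
    index = {"smith": {"/employees/1", "/employees/2"}}
    database["/employees/1"] = "Alice Smith-Jones"  # changed after indexing
    print(keyword_search("Smith", index, database.__getitem__))
    # ['Alice Smith-Jones', 'Bob Smith']
```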

13. The method of claim 12, wherein the resource locator 
includes an URL. 

14. The method of claim 12, wherein the allowing step
reads a data dictionary which identifies only the selected 
data items. 

15. The method of claim 12, wherein the allowing step 
includes reading data from data items which are records in
a relational database. 

16. The method of claim 12, wherein the allowing step 
includes reading data from data items which are nodes in a 
hierarchical database. 

17. The method of claim 12, wherein the allowing step
includes reading data from data items which are objects in
an object-oriented database.

18. The method of claim 12, wherein the step of using the 
resource locator comprises extracting a data item's location 
identifier from the resource locator, and then using the
location identifier to retrieve the item's current data. 

19. The method of claim 12, wherein the step of using the 
resource locator includes generating a request to retrieve the 
item's current data from the database. 

20. The method of claim 19, wherein the request includes
an SQL query. 
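Claims 18 to 20 can be illustrated with Python's built-in `sqlite3` module standing in for the structured database. The sketch assumes locators end in `/<table>/<key>` and uses a hypothetical whitelist, `KEY_COLUMNS`, because SQL parameters can bind values but not table or column names.

```python
import sqlite3
from typing import Optional, Tuple
from urllib.parse import unquote, urlsplit

# Hypothetical whitelist of searchable tables and their primary-key columns.
KEY_COLUMNS = {"employees": "emp_id"}


def locator_to_query(locator: str) -> Tuple[str, Tuple[str, ...]]:
    """Extract (table, key) from a locator ending in /<table>/<key>; build a query."""
    *_, table, key = urlsplit(locator).path.split("/")
    table, key = unquote(table), unquote(key)
    if table not in KEY_COLUMNS:
        raise ValueError(f"table not selected for searching: {table!r}")
    # Identifiers come from the whitelist; only the key value is a bound parameter.
    sql = f"SELECT * FROM {table} WHERE {KEY_COLUMNS[table]} = ?"
    return sql, (key,)


def fetch_current(conn: sqlite3.Connection, locator: str) -> Optional[tuple]:
    """Run the generated query and return the item's current row, if any."""
    sql, params = locator_to_query(locator)
    return conn.execute(sql, params).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
    print(fetch_current(conn, "http://dbhost.example.com/records/employees/2"))
```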

21. The method of claim 12, further comprising the 
computer-implemented step of generating a textual docu- 
ment containing the retrieved data. 

22. The method of claim 21, wherein the document is
generated in a markup language format. 

23. The method of claim 22, wherein the document is 
generated in HTML format. 
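A page generator in the sense of claims 21 to 23 might look like the following sketch, which renders one retrieved record as an HTML table and escapes values with the standard `html.escape`. The layout and the function name `generate_page` are assumptions for illustration.

```python
from html import escape
from typing import Mapping


def generate_page(title: str, record: Mapping[str, object]) -> str:
    """Render one retrieved record as a small HTML page (field/value table)."""
    rows = "\n".join(
        f"<tr><th>{escape(str(name))}</th><td>{escape(str(value))}</td></tr>"
        for name, value in record.items()
    )
    return (
        "<!DOCTYPE html>\n<html><head><title>"
        f"{escape(title)}</title></head>\n<body>\n<table>\n{rows}\n</table>\n"
        "</body></html>\n"
    )


if __name__ == "__main__":
    print(generate_page("employees/2", {"emp_id": 2, "name": "Bob <Jr>"}))
```

Escaping matters here because database values are arbitrary text; without it a value such as `<Jr>` would be read as markup by a browser or by the indexing agent.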

24. A computer storage medium having a configuration 
that represents data and instructions which will cause at least
a portion of a computer system to perform method steps for 
supporting keyword searches of data items in a structured 
database, the method steps comprising the steps of claim 13. 

25. The storage medium of claim 24, wherein the method 
steps comprise the steps of claim 15.

26. The storage medium of claim 24, wherein the method 
steps comprise the steps of claim 19. 

27. The storage medium of claim 24, wherein the method 
steps comprise the steps of claim 20. 

28. The storage medium of claim 24, wherein the method
steps comprise the steps of claim 22. 

29. A computer system comprising: 

selecting means for selecting data items in a structured 
database; 

retrieving means for retrieving from the database the
current data of a selected data item; and 

exposing means for exposing to an indexing agent infor- 
mation about a data item's location in the database 
together with information about the data item's
retrieved data, 

wherein the structured database includes a relational 
database, the data items include relational database 
records or tables, and the selecting means includes a 
selection data dictionary which specifies only selected 
relational database records or tables. 

30. The system of claim 29, wherein the selecting means 
includes a schema defining elements of the structured data- 
base. 

31. The system of claim 29, further comprising an admin- 
istration tool for modifying the selecting means. 

32. The system of claim 31, wherein the selecting means 
includes a selection data dictionary which specifies only 
selected relational database records or tables, and the admin- 
istration tool is capable of creating and modifying the 
selection data dictionary. 

33. The system of claim 29, wherein the retrieving means 
includes a database reader capable of generating requests to 
retrieve data from the structured database. 

34. The system of claim 33, wherein the database reader 
is capable of generating SQL queries. 

35. The system of claim 29, further comprising the 
indexing agent. 

36. The system of claim 35, wherein the indexing agent 
includes a web crawler. 

37. The system of claim 29, further comprising a search 
engine interface. 

38. The system of claim 37, wherein the search engine 
interface and the retrieving means reside on different nodes 
in a network. 

39. The system of claim 38, wherein the search engine 
interface and the retrieving means communicate with one 
another using a TCP/IP network protocol. 

40. The system of claim 38, wherein the search engine 
interface and the retrieving means communicate with one 
another using an IPX network protocol. 

41. The system of claim 29, further comprising an index 
produced by the indexing agent. 

42. The system of claim 41, wherein the index contains 
keywords and corresponding resource locators for both the 
structured database and a textual document information 
source residing at a different network location than the 
structured database. 

43. The system of claim 41, wherein the index contains 
keywords and corresponding resource locators for at least 
two structured databases residing at different network loca- 
tions. 
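Because each resource locator names its own host, one index can serve several sources, as in claims 42 and 43. A minimal sketch, assuming each source's index is a dictionary from keyword to locator set (the example URLs are hypothetical), merges them by set union:

```python
from typing import Dict, Iterable, Set


def merge_indexes(indexes: Iterable[Dict[str, Set[str]]]) -> Dict[str, Set[str]]:
    """Union per-source keyword indexes into one index of resource locators."""
    merged: Dict[str, Set[str]] = {}
    for index in indexes:
        for keyword, locators in index.items():
            merged.setdefault(keyword, set()).update(locators)
    return merged


if __name__ == "__main__":
    db1 = {"smith": {"http://db1.example.com/records/employees/1"}}
    db2 = {"jones": {"http://db2.example.com/records/staff/7"}}
    web = {"smith": {"http://www.example.com/staff.html"}}
    print(sorted(merge_indexes([db1, db2, web])["smith"]))
```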

44. A computer system comprising: 

selecting means for selecting data items in a structured 
database; 

retrieving means for retrieving from the database the 
current data of a selected data item; and 

exposing means for exposing to an indexing agent infor-
mation about a data item's location in the database 
together with information about the data item's 
retrieved data, wherein the exposing means includes a 
page generator capable of generating a textual docu- 
ment containing the retrieved data. 

45. The system of claim 44, wherein the page generator is 
capable of generating an HTML page containing the 
retrieved data. 



01/30/2004, EAST version: 1.4.1 



